Edge usage, motifs and regulatory logic for cell cycling genetic networks 
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The cell cycle is a tightly controlled process, yet its underlying genetic network shows marked 
differences across species. Which of the associated structural features follow solely from the ability 
to impose the appropriate gene expression patterns? We tackle this question in silico by examining 
the ensemble of all regulatory networks which satisfy the constraint of producing a given sequence 
of gene expressions. We focus on three cell cycle profiles coming from bakers yeast, fission yeast and 
mammals. First, we show that the networks in each of the ensembles use just a few interactions 
that are repeatedly reused as building blocks. Second, we find an enrichment in network motifs that 
is similar in the two yeast cell cycle systems investigated. These motifs do not have autonomous 
functions, but nevertheless they reveal a regulatory logic for cell cycling based on a feed-forward 
cascade of activating interactions. 

PACS numbers: 87.16.Yc, 87.18.Cf, 87.17.Aa 
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I. INTRODUCTION 

The cell cycle - biomass accumulation, DNA replica- 
tion and cell division - is at the heart of life. Disruptions 
in this cycling, due to environmental changes or genetic 
defects, can lead to cell death or to uncontrolled prolifera- 
tion such as in tumorigenesis [l[ . It is thus not surprising 
that the cell cycle is tightly controlled in all organisms, 
allowing a stereotyped sequence of events to progress in 
an orderly and timely fashion. Various molecular types 
participate in this orderly progress, but each species has 
its own specific regulatory machinery 0, Q . 

Detailed investigations along with reconstructions of 
network interactions have allowed the building of quan- 
titative cell cycle models based on ordinary differential 
equations (ODEs), in particular for budding yeast Q, 
fission yeast (BJ and mammals @. The mathematical 
complexity of these systems is very high. To render them 
more comprehensible, simplifications have been proposed 
while preserving most of the qualitative aspects of their 
dynamics Q- 

Reducing complexity still further, a number of authors 
have provided simplified models by discretising time and 
by replacing the continuous concentrations of molecular 
species by presence/absence (binary) variables. Such a 
Boolean approach has a long history going back to the 
pioneering work of Kauffman H| (see also the book [9] 
and references therein) and since then it has been used 
in quite a number of systems. In particular it has been 
applied to the cell cycle in the three previously men- 
tioned species (Tol - [T3 | with a fair amount of success. This 
Boolean approach is complementary to that based on 
ODEs: it is much less powerful quantitatively but it al- 
lows for simpler model construction and interpretation, 
leading to qualitative insights into the generic aspects of 
regulatory rules. The present work is of a similar vein, 



but uses a more realistic model of molecular interactions 
and focuses on the topology of regulatory networks. 

Adjusting a dynamical model to reproduce observed 
cell cycle dynamics can be challenging but it is also 
an under-specified problem: because of the plethora 
of parameters, there are generally many different solu- 
tions [IH . Certainly, nature takes advantage of this free- 
dom, but a consequence is that related species can have 
quite different regulatory circuits, impeding attempts to 
extract common regulatory principles. To overcome this 
difficulty, we have taken an in silico approach that char- 
acterizes all possible regulatory circuits that are com- 
patible with given cell cycle expression dynamics. This 
route allows us to determine which features are always 
or mostly present in these in silico circuits and it also 
makes possible more meaningful comparative studies of 
different biological networks. 

The present work builds on regulatory models devel- 
oped in refs. [l^, [l7[ where we revealed several conse- 
quences of toy expression constraints on network struc- 
ture. Here, we use the actual expression patterns found 
in three biological systems. Our modeling of the regu- 
latory networks uses thermodynamic considerations to 
describe the underlying molecular interactions. Inter- 
estingly, the actual number of effective inter-molecular 
interactions becomes an emergent feature, driven by bio- 
physical constraints coupled to an appropriately defined 
fitness function mimicking the selection pressure operat- 
ing in nature. Based on this choice of genetic regulatory 
network (GRN) modeling, we sample by Markov Chain 
Monte Carlo (MCMC) the space of all GRN that pro- 
vide the proper gene expression patterns. We do so for 
the three ensembles of GRN where the gene expression 
profiles are taken from budding yeast, fission yeast and 
mammals derived in refs. fiol Hz . H3 |. For each, we ex- 
tract the structural properties of the sampled GRN, from 
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edge usage to the presence of network motifs. Finally, 
by comparing to the case of an idealized cell cycle, we 
reveal an underlying common regulatory logic in which 
feed-forward activating cascades play a central role. This 
logic is apparent when the associated motifs are exter- 
nally driven, but the motifs on their own do not imple- 
ment autonomous dynamics. 

In the next section we briefly sketch the main aspects 
of our model, stressing the changes from our previous 
work [l6|, [ljj to deal with new aspects of the problem. In 
section III we present our results. We conclude in section 
IV. 



II. MODELS AND METHODS 



A. Biophysical modeling of molecular interactions 



B. Time dynamics of expression levels 

Refs. fllil Kt\ considered transcriptional regulation; we 
have to generalize the framework to any kind of regula- 
tion because of the different molecular types driving the 
cell cycle. Let Sj(t) £ [0,1] (j = 1,...JV) denote the 
average level of expression of the molecular species j at 
time t, normalized by its maximum value. The N expres- 
sion levels {Sj(t)} will be referred to as the phenotype at 
time t. We are interested in processes of the type 



{S,-(to)}-> {£,-(*!)} 



{S 3 (t M )} 



(2) 



that must reproduce as well as possible a target cell cycle 
sequence, i.e., the sequence deduced from experimental 
measurements. For example, in the case of the fission 
yeast S. Pombe, this sequence pattern is given in Table U 
as specified in ref. (l2l |. 



In previous work [lQ, |l7[ we developed a biophysical 
framework for modeling the regulatory interactions be- 
tween a transcription factor (TF) and its DNA binding 
site. For cell cycling systems, a number of the actors, 
such as cyclins, are not TFs, so our previous framework 
must be generalized. The starting point is a network 
whose nodes and edges refer to molecules and activa- 
tion/repression processes, be-they transcriptional, post- 
transcriptional, translational or post-translational. All 
interactions involve specific binding sites or surfaces that 
allow two molecules to physically establish contact and 
bind through the addition of small forces. Almost all 
of the facing elements (atoms, bases, amino acids, ...) 
have to "match" for the two molecules to bind. Fol- 
lowing ref. [LS|] , we consider that each of the facing re- 
gions is specified by an ordered lists of symbols that can 
be thought of as a string or a table characterizing that 
type of molecule and its site or surface dedicated to that 
binding partner. Then the mutual binding energy of two 
molecules is taken to be additive in the number of mis- 
matches between their strings, each mismatch contribut- 
ing a penalty e. 

Such a framework is easily justified for TF-DNA bind- 
ing energies and leads to a thermodynamic formula for 
the probability of occupation of a given DNA binding site 
in the presence of n,j TFs of type j [l!| : 



Start SK Cdc2 Cdc25 Cdc2* Slpl PP Ste9 Ruml Weel 



P — 



1 



1 + 1/(7^) 



where Wij = e 



-edi 



(i) 



TABLE I. The cell cycle expression profiles of fission yeast 
(from 12]). Time runs from the top to the bottom. Succes- 
sive, idealized expression levels, are either or 1, at each time 
step. We keep the terminology from [l^]: the initial state's 
expression is given by the first line of the table (referred to 
as a "START" state) and the last state is a fixed point as- 
sociated with the cell size check point. When a critical cell 
size is reached, a signal (implemented by having a so-called 
"Start" node turn on) triggers the cycle again. The Cdc2 
(Cdc2*) stands for Cdc2/Cdcl3 (Cdc2/Cdcl3*), where the * 
sign indicates the highly activated form of the complex [L^ ]. 
Similarly Weel stands for the Weel/Mikl kinases. 

Following refs. 0, [13], the discrete time dynamics for 
the expression levels are based on a mean field approx- 
imation which neglects cooperative binding effects. Ex- 
plicitly, we have 

Si(t + 1) = {1- Ill 1 - ^WDlTt 1 - *V(*)] ( 3 ) 



Here dy is the number of TF-DNA mismatches. The where the P i3 were defined in Eq.([T|). In the present 



derivation of Eq. (p} in ref. [2(| shows that it is of rather 
general validity, so we shall apply it to all of the molecular 
species in our cell cycle system. 

Note that Py depends strongly on dij and is appre- 
ciable only when dij is small. With a realistic choice of 
model parameters, a small dij is a priori very unprob- 
able, so that a functional molecular contact is also very 
unprobable. 



expression, j runs over activating and j' over inhibitory 
interactions respectively. It is easy to see that Eq.Q 
embodies a simple logic for each Sf. (i) each activator or 
repressor acts independently; (ii) the binding of at least 
one activator is required for activation of the target; (iii) 
the binding of even a single repressor is sufficient to veto 
the activation and so will turn off the expression of the 
target. For the sake of simplicity we set rij — nSj(t) 
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in Eq.([T]), with a value of n common to all molecular 
species. For our simulations, we set n = 1000, but we 
have checked that the results presented depend little on 
the magnitude of this parameter. Thus the fluctuations 
in the number of molecules of a given type will not matter 
much, further justifying our mean field approach. 

Notice that Eq.([T]) holds provided the system is in 
equilibrium. Thus, in writing Eq.Q it is assumed that 
the binding-unbinding reaction is fast compared to that 
for full activation/repression. However, the rate of a 
binding-unbinding reaction is proportional to the concen- 
tration of reactants and the equilibration time becomes 
very large for small concentrations. To avoid this lim- 
itation, attempts to generate very small concentrations 
should be ignored. Furthermore, the unphysical assump- 
tion that there is always enough time to reach equilibrium 
sometimes leads to pathologies. In particular, the system 
suffers from an instability: even a relatively weak inter- 
action has an a priori capacity of generating a spurious 
progressive amplification of physically unsignificant (be- 
cause very low) expression levels. To avoid this bad be- 
havior and improve the robustness of the model, we have 
recourse to a phenomenological correction, introducing a 
threshold H to modify Eq.© whenever the expression 
level is too small: 

Si(t + 1) = S mm when Si(t + 1) < H (4) 

In effect, S min can be thought of imposing a basal rate 
or some level of leakiness on the transcription while the 
threshold H ensures that expression levels stay low unless 
the input signal is sufficiently activatory. In practice we 
set S mm = 1/n = 0.001 as if there were just one molecule 
of that species and H is set to H = 0.01. Then, Si(t) 
can leave the minimal level only when at least one dij is 
small enough to have a dynamical significance [l7j . 

C. Computational implementation 

For each i and each j, (i,j), the interaction strength 
Wij is given by Eq.flt where dij can be considered to 
be the mismatch [18[ between two strings of length L. 
In the simulation, rather than storing the 2 character 
strings for each such interaction, we simply store the bi- 
nary difference string (also of length L) specifying which 
entries match (1) and which mismatch (0). There is also 
a sign associated with each interaction: + for an acti- 
vator, - for a repressor. Mutations can change the sign 
of an interaction or they can transform a match into a 
mismatch and vice versa. Following the procedure used 
for TFs [l|| [I?]], and to keep the modeling simple, we 
consider that the original strings use a four letter code. 
Then the probability that a mutation replaces a match 
by a mismatch is 3/4, while the probability that a mis- 
match is converted into a match is 1/4, embodying the 
fact that it is easier to have a mismatch than a match. 
(Note that even for protein-protein interactions, this rule 
can be motivated by the fact that molecular changes are 



often the result of a point mutation at the DNA level.) 
We will refer to the matrix dij and to the associated set 
of signs as the genotype. Typically dij ^ dji since these 
two numbers of mismatches correspond to different pro- 
cesses: the former to the activation of i by j, while the 
latter to the activation of j by i. In these two processes 
different active sites have to match. 

For a given genotype, the dynamical system modeling 
the cell cycle is initialized in the "START" state and then 
the trajectory is generated by iterating Eq.Q. At each 
time step we compute the distance D(t) 

N 

D{t) = Y J \Si{t)~Sl a ^ t {t)\ (5) 

i=l 

between the vectors of expression levels associated with 
the actual sequence of the dynamical system at hand and 
the target one (for fission yeast, the target sequence is 

given in Table [TJ) . The total distance Dt = J2t=o ^(*) * s 
then used to define the fitness F of the genotype via 

F = e- fDT (6) 

where the parameter / gives the strength of the selection 
pressure to maintain "good" expression profiles. Indeed, 
deviations from the target expression profile are likely to 
be deleterious for the proper functioning of a cell and 
its progeny. Note that this fitness defines a weight for 
each genotype and thus an ensemble where fitness plays 
the role of a Boltzmann factor. Finally, to sample geno- 
types according to their fitness, we use Markov Chain 
Monte Carlo (MCMC). The fitness enters the associated 
Metropolis test which accepts or rejects a trial change 
(mutation) to the genotype. We always start with a com- 
pletely random network and thermalize it until it has a 
good fitness. Then, we produce a long sequence of sev- 
eral thousand GRN genotypes, each being separated by 
200 sweeps (a sweep is a series of N 2 L attempted muta- 
tions and sign flips). To deal with possible fragmentation 
of the search space we checked that the statistical 
properties of the resulting genotypes were the same for 
several independent simulations, i.e. each simulation be- 
ing initialized with a different random genotype. 

The parameters values used in our computations are 
L = 12, e = = 20, H = 0.01 and n = 1000. 

Compared to ref. [17j . H is a new parameter, e is slightly 
smaller while the remaining are the same. Interestingly, 
it is essential that L and e be such that the a priori 
probability of a small mismatch is very low. Once that 
constraint is taken into account, the model results are 
robust with respect to the variation of the parameters. 

The Metropolis test described above forces Dt to be 
relatively small if / is large, but the MCMC is inefficient 
in that regime, forcing us to work with not too large val- 
ues of /. Then in the T x N table of numbers Si(t), there 
may appear a few that are "ambiguous" , that is that are 
away from the target value which in this study is either 
or 1. For example an expression level may have the 
value 0.3 in a place where the target value is 0. Forcing 
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oaker s 


Fission 


Mammalian Idealized 


E w t 


29 


23 


22 




{Eess} 


27.98(2) 


20.96(2) 


13.92(1) 


17.62(2) 




12.20(2) 


9.02(2) 


5.30(2) 


6.78(1) 


(Dr)/T 


0.534(1) 


0.493(1) 


0.491(1) 


0.358(1) 


{Son) 


0.881(1) 


0.861(1) 


0.879(1) 


0.920(1) 


(Soff) 


0.011(1) 


0.014(1) 


0.026(1) 


0.006(1) 



0.3 



I" 0.2 



0.1 



0.0- 



Baker's yeast 



=d 



Wild-type 
network 



24 



2X 



32 



0.3 



0.2 



36 



0.0 



Number of essential interactions 



Fission yeast 



Wild-type 
network 



18 20 22 24 26 
Number of essential interactions 



TABLE II. Statistics from networks associated with the cell 
cycle expression patterns of baker's yeast, fission yeast, mam- 
mals, and with an idealized cell cycle expression pattern. E w t 
stands for number of interactions (edges) in the wild-type net- 
work. The other symbols refer to properties of in silico net- 
works generated by MCMC: E ess is the number of essential 
interactions; E rep is the number of inhibitory interactions; Dt 
is the total distance between the actual expression patterns 
and target one (nearly constant and ~ 0.5 when divided by 
the number of steps of the cycle); Son {Soff) is the expres- 
sion level of genes that are ON (OFF) in the target phenotype. 

a better agreement by increasing / is not computation- 
ally feasible. To overcome this difficulty, from our large 
MCMC sample, we have selected genotypes where such 
ambiguities are absent, imposing that expressed (respec- 
tively unexpressed) molecular types have in fact levels 
above 0.6 (respectively below 0.2). This selection is done 
mostly for esthetic reasons, the properties of the full sam- 
ple are essentially the same because of the low frequency 
of these ambiguities. 



III. RESULTS 
A. Essential interactions 

The networks generated by the MCMC are character- 
ized at a microscopic level by weights Wij quantifying the 
regulatory interaction strength of a molecule of type j on 
the expression level of molecules of type i. Just as had 
been found in our previous work [l6l , most Wij re- 
main negligible, but a small fraction have values that are 
far above the background level. These interactions are 
the ones important for reproducing the target sequence 
of expression states. We formalize this property as fol- 
lows: if setting Wij = causes the Metropolis test to 
reject the change 5 times in a row, we say that the inter- 
action Wij is "essential" . The set of essential interactions 
of a network allows for a summary representation that is 
well suited for visualising the influences of each molecular 
type. A genotype's set of essential interactions defines a 
directed network, the essential network, from which one 
may extract informative regulatory patterns. 

It is well known that biological GRNs are sparse. As 
shown in refs. |16l.ll7j|. our model very naturally generates 
sparse essential networks because relevant - i.e. small - 
mismatches are a priori unprobable and arise only under 
strong selection pressure. Then simply because of en- 



FIG. 1. Number of essential interactions in regulatory net- 
works generated within the baker's (left) and fission (right) 
yeast ensembles. In both cases the number of interactions 
measured in the wild-type networks [l(| 03] is close to the 
position of the peak in the ensemble's distribution. 



tropy, the model strongly favors low numbers of essential 
interactions. However, this argument is qualitative only. 
Let us thus compare quantitatively the numbers of edges 
produced within the model to the numbers in the biolog- 
ical data as inferred by the different authors [ToL [l2T fl3|. 
Hereafter, we will refer to those reconstructed networks 
as the "wild- type" networks. In Table |H] we give the 
average number of essential interactions along with the 
value in the wild-type network for baker's yeast [Tcj, fis- 
sion yeast [HI and mammals [l3j . (Note: we do not 
count the "self-degradation" interactions added by hand 
by some of these authors which are specific to their mod- 
eling of the discrete time dynamics of the expression lev- 
els.) The mammalian wild- type network appears to use 
significantly more interactions than our model but the 
two yeast species seem to be in good agreement with our 
model. To quantify this, we have measured the fluctua- 
tions in each ensemble. As shown in Fig. [TJ for the two 
yeast ensembles the distributions of the number of edges 
are bell shaped and narrow, so in fact the fluctuations are 
very mild. Furthermore, the difference between the wild- 
type numbers and the expected value in these ensembles 
are not much more than one standard deviation, a re- 
sult that is quite non-trivial. Most of results presented 
hereafter concern the two species of yeast. The fact that 
our model does much better for yeasts than for mam- 
mals may not be a coincidence. Indeed, although baker's 
yeast and fission yeast are highly divergent evolutionarily, 
much more so than any mammals amongst themselves, 
they are both uni-cellular organisms. It is well known 
that the regulatory control of gene expression is intrin- 
sically more complex in multi-cellular organisms, so a 
posteriori one may draw some satisfaction from the fact 
that our simple model seems to do well when regulation is 
simple but does much less well when regulation involves 
very complex mechanisms. In the remaining subsections 
we will omit results referring to the mammalian cell cy- 
cle. It is sufficient to say that they are systematically off 
the data. 
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FIG. 2. Frequencies of activators (top) and inhibitors (bot- 
tom) for networks produced in the ensemble relevant for the 
cell cycle in baker's yeast. Array element corresponds to 
the probability of finding a link i <— j in the analysed ensem- 
ble of GRNs. Dark grey highlighted entries have frequencies 
higher than 90%; a thick solid (respectively thick dashed) 
frame indicates that the link is present (respectively absent) 
in the yeast wild- type network of ref. [HJ] (see also Fig. [5|. 
Light grey fields indicate the other activators/repressors in 
the wild-type network. 



B. Edge usage 

Now consider the edge usage in the essential networks. 
The frequency with which there is an essential inter- 
action from node j to node i can be represented by a 
square matrix with entries between and 1. We have 
computed these matrices for the three cell cycles, and 
in Fig. [5] we display the results separately for activat- 
ing and inhibitory interactions in baker's yeast. Partic- 
ularly, there are 9 activators (7 common with the wild- 
type) and 7 repressors (4 common with the wild-type) 
almost always present in the ensemble for baker's yeast 
(c/. dark grey entries in Fig. [2]). Further, the results can 
be compared with frequencies of activatory/inhibitory in- 
teractions from [ll|, where 6 activators and 4 repressors 
(all common with the wild-type) are found to be abso- 
lutely required for a network to produce the cell-cycle 
process. Specifically, among these absolutely required 
interactions, 8 edges are almost always present and two 
edges are present in more than 80% of networks in our 
ensemble. 
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FIG. 3. Frequencies of activators (top) and inhibitors (bot- 
tom) for networks produced in the ensemble relevant for the 
cell cycle in fission yeast. The conventions for highlighting 
the values are the same as in Fig. [2] but here refer to the wild- 
type network of fission yeast as specified in ref. [l^] (see also 
Fig.©. 

On a more general level there is a manifest dichotomy: 
certain interactions are very rarely if ever present, while 
others are often if not always present. Furthermore, there 
is a clear structure to the matrix: in the line just below 
the main diagonal, activating interactions are very fre- 
quent. A different pattern arises for the inhibitory in- 
teractions: there, all the entries near the diagonal are 
always absent, and elements 4 or 5 steps shifted from the 
diagonal are frequently or always present. 

Qualitatively the same pattern arises in fission yeast 
but it is noisier (Fig. [3]), and in the case of the mammalian 
cell cycle which involves only 7 genes, there are simply 
remanants of these patterns. 

C. Overlaps with the wild- type networks 

If an essential interaction of an in silico GRN is present 
in the wild-type network, we shall shall call this interac- 
tion a "hit" . What fraction of the essential interactions 
of a GRN are hits? If the fraction is close to 1, there is a 
very strong overlap between that GRN and the wild-type 
network. As can be seen in Fig. 21 in practice the fraction 
varies from network to network but typically takes values 
in the range from 50% to 70% in both yeast species. The 
mean fractions are 65% and 57% for fission and baker's 
yeast so clearly there is a strong overlap between the in 
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FIG. 4. Histograms for the fraction of essential interactions 
that are "hits" in the baker's (left) and fission (right) yeast en- 
semble. We see that networks in our ensembles have roughly 
50% to 70% of interactions in common with the wild-type 
networks. Notice, that the wild-type networks are not part of 
our ensemble. 



silico and wild-type networks. The highest fraction of 
hits found in our MCMC is 73% for baker's yeast and 
76% for fission yeast. Notice that this fraction is close to 
the overlap of hits with respect to all links in the wild- 
type network, since in our model E wt and E eas are similar 
for both yeast species (see Tab. fTTj) . To compare visually 
the associated networks to the corresponding wild-type 
networks, we display them in Fig. [5] for baker's yeast and 
in Fig. [5] for fission yeast. The main differences arise for 
the Sicl and Clbl,2 genes which in the wild- type have 
higher degree and more mutually inhibitory interactions 
than in the GRN generated by MCMC. Such interaction 
pairs are quite natural in the context of protein-protein 
interactions and thus probably reveal a shortcoming of 
our modeling framework. 



D. A multitude of different essential networks 

The MCMC generates a very large number of geno- 
types from which we construct essential networks. In 
effect, each essential network corresponds to a different 
way interactions can be specified so as to reproduce the 
target gene expression dynamics. Many genotypes give 
rise to the same essential network, but are there many 
different essential networks? The answer of course is yes, 
we find hundreds of different such networks. One may 
then ask whether some arise much more frequently than 
others. On the right hand side of Fig. [5] (respectively 
Fig. [BJ we show the most frequently found essential net- 
work for baker's yeast (respectively fission yeast); these 
arise with frequencies between 2 and 3%. Their char- 
acteristics are similar to those of the essential networks 
having the highest overlaps with the wild-type networks. 
These most frequent essential networks arise hundreds of 
times more frequently than the rarest ones. In Fig. [7] we 
show the associated rank histogram for the concurrence 
frequencies on a log-log plot. The fat tail in this distri- 
bution is roughly compatible with a power law. Such a 
shape for a rank histogram arises in a number of other 
systems, in particular in neutral networks where many 



genotypes give rise to the same phenotype. 



E. Network motifs 

In the terminology of Alon [2l|, a network motif in 
a (biological) graph is a subgraph that is present at a 
higher frequency than expected at random. The term 
"at random" is generally denned using an ensemble of 
graphs where each node is constrained to have the same 
in and out degree as in the biological graph of interest. In 
practice this ensemble is generated using a randomization 
of the edges of the biological graph H3 and that is 
what we have done too. 

Given the genotypes in each of our ensembles, pro- 
duced by the MCMC, we have extracted the motifs 
present in the associated essential networks. The differ- 
ent motifs we find are represented graphically in Fig. [5] 
For two nodes, we find no significantly over expressed 
motifs. For three nodes, we find that for both yeast cell 
cycle ensembles several of the coherent feed-forward loops 
are over represented. In particular for the baker's yeast 
ensemble, these over-representation factors are close to 
10. Finally, for four nodes, we find the presence of nu- 
merous motifs: two kinds of incoherent diamonds, two 
kinds of frustrated four-point loops and incoherent bi- 
fans. The detailed frequencies of each of the different 
motifs in these ensembles are given in Tab. Mil 

Finally, instead of working within the in silico GRNs 
produced by our ensembles, one can check for motifs in 
the wild-type networks. We find that the motifs are not 
all the same as in the associated ensemble. Among the 
strongest differences, the RR motif consisting of two mu- 
tually inhibitory nodes is strongly over-represented in the 
wild-type networks but not in the in silico networks. Dif- 
ferences exist also for a number of other motifs. For 
example, in the wild-type network of baker's yeast, the 
numbers are: motif RR = 3; A (Cl-FFL) = 2, C (C4- 
FFL) = 1, D (neg 3-loop) = 2. In the case of fission 
yeast, the numbers are motif RR = 4; diamonds: G = 2, 
H = 2; bi-fan: 1=1. These numbers can be compared 
to those in the GRN ensemble using Tab. IIIII 



F. Regulatory logic 

The pattern of edge usage obtained in Fig. [5] depends 
on the order of the genes. The order we chose makes the 
successive expression states resemble as much as possible 
a left to right shifting block of l's. As is visible in TableQ] 
even with this "best" choice, irregularities remain, sug- 
gesting one consider an idealized case. We thus replaced 
the irregular shifting block by a regular one and then 
repeated our methodology on these idealized cell cycles. 
Interestingly, the associated essential networks have edge 
usage and motifs similar to the ones previously described 
for the wild-type cell cycles. Furthermore, the activating 
interactions typically form one long feed forward cascade. 
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FIG. 5. Left: The wild-type regulatory network for the baker's yeast cell cycle has 29 links, of which 15 are activating (solid) 
and 14 are inhibitory (dashed). Compared to ref. [l(| we do not include the added self-degradation links. Middle: Regulatory 
network in the MCMC ensemble with the highest fraction of hits (73% of the essential interactions are present in the wild- type). 
This network has 19 hits (black) and 7 non-hits (light grey) for a total of 26 essential interactions. Right: The most frequent 
network in the baker's yeast ensemble (2.6% of all GRNs, 16 hits (black) and 10 non-hits (light grey) for a total of 26 essential 
interactions) . 




FIG. 6. Left: The wild-type regulatory network for the fission yeast cell cycle has 23 links, of which 8 are activating (solid) 
and 15 inhibitory (dashed). Compared to [l2l ] we do not include the added self-degradation links. Middle: Regulatory network 
in the MCMC ensemble with the highest fraction of hits (76% of the essential interactions are present in the wild-type) . The 
network has 16 hits (black) and 5 non-hits (light grey) for a total of 21 essential interactions. Right: Most frequent network in 
the fission yeast ensemble (2.1% of all GRNs, 14 hits (black) and 6 non-hits (light grey) for a total of 20 essential interactions). 



To illustrate this phenomenon, we show in Figure [5] the 
most frequently found essential network when idealizing 
the target sequence of Table Q] All activating (non-self) 
interactions follow each other from "START" to the last 
expression state. The function of such a cascade is clear 
but if one extracts an associated motif, say three consec- 
utive activating interactions, it has no autonomous dy- 
namics. To understand the functioning of such a motif, 
one has to externally drive it, providing inputs and initial 
conditions that will initiate a "block" of on genes. The 
motif will then propagate as a cascade of falling domi- 
noes [24[ until the block of l's is pushed out and the 
whole region consists of 0's. 



IV. DISCUSSION AND CONCLUSION 

It is quite remarkable that a model, based on rather 
elementary thermodynamic and probabilistic principles, 
succeeds in reproducing much of the cell cycle network 
topology found in both baker's yeast and fission yeast, 
two highly divergent organisms. Our in silico approach is 
based on considering all possible networks subject to the 
constraint that they give rise to the same gene expres- 
sion patterns as the experimentally observed systems. 
In effect, this approach allows one to infer how network 
regulatory architectures are constrained by the network 
"function" . Within this framework, our modeling was 
able to give a good indication of the number of interac- 
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FIG. 7. Frequency in the MCMC of the different essential 
networks for three ensembles: baker's yeast (squares), fission 
yeast (circles) and idealized cell-cycle (triangles). In all cases 
there are a few most frequent networks and the distribution 
of frequencies for the rarer networks roughly follows a power 
law. 
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FIG. 8. Most prominent network motifs found from our GRN 
ensembles. Corresponding frequencies are given in Tab. IIIII 



tions and of the actual edges that are used in the two 
yeast species. We also considered higher order features 
such as network motifs; this allowed us to reveal an ubiq- 
uitous design principle in the form of an activatory cas- 
cade that is realized perfectly within an idealized cell cy- 
cle and is manifest in the less regular cell cycles studied 
here. 

However in the case of the mammalian cell cycle net- 
work our results are less impressive to say the least. As 
already mentioned, this is not really a surprise. In view of 
the simplicity of the model, its failures are not devoid of 
interest: they indicate where a more sophisticated mod- 
eling is necessary. It is reassuring that the model does 
better for uni-cellular organisms than multi-cellular or- 
ganisms which have far more levels of regulatory control; 
the contrary would be suspect! 

We were seeking to determine which aspects of GRN 
structure are expected from simple modeling and argu- 



Motifs 


Baker's yeast Fission yeast Idealized cycle 


A (Cf-FFL) 


1.04(2) 


0.0075(11) 


0.026(1) 


randomized 


0.30(f) 


0.162(7) 


0.149(4) 


B (C3-FFL) 


1.93(2) 


0.23(1) 


0.050(2) 




0.22(f) 


0.094(5) 


0.073(2) 


C (C4-FFL) 


1.93(2) 


0.21(2) 


0.279(4) 




0.22(f) 


0.099(5) 


0.072(2) 


D (neg 3-loop) 


0.80(1) 


0.091(6) 


0.0021(4) 




0.34(f) 


0.258(8) 


0.284(5) 


E (neg 4-loop) 


0.81(2) 


0.063(5) 


1.97(1) 




0.061(3) 


0.023(3) 


0.029(2) 


F (neg 4-loop) 


0.98(2) 


0.023(3) 


2.27(2) 




0.065(3) 


0.020(2) 


0.029(2) 


G (diamond) 


0.41(2) 


0.098(5) 


0.39(1) 




0.045(3) 


0.030(4) 


0.036(2) 


H (diamond) 


0.48(2) 


0.050(4) 


1.49(1) 




0.071(3) 


0.050(3) 


0.111(3) 


I (bi-fan) 


0.10(1) 


0.0037(9) 


0.73(1) 




0.023(3) 


0.0035(8) 


0.0056(5) 



TABLE III. Frequencies of motifs found in regulatory net- 
works generated with baker's yeast, fission yeast and idealized 
cycle target patterns. For each motif, the first line provides 
the frequency in the MCMC ensemble and the second line 
provides the frequency in the randomized networks. Numbers 
in bold indicate motifs for which the frequency in the GRN 
ensemble is at least two times higher than in randomized en- 
semble. Motifs corresponding to the symbols are shown in 
Fig. El 




FIG. 9. Most frequent essential network for the idealized cycle 
with 11 nodes. It arises in 6.1% of all GRNs. 



ments of a generic nature. Hence, we have assumed that 
all interactions are a priori possible and only allowed the 
expression phenotype to constrain the networks. But it 
is clear that ignoring chemistry in a biological process 
is somewhat limiting and may be responsible for some 
of the features arising in the biological networks but not 
in our in silico networks. A likely candidate for this is 
the motif consisting of 2 mutually repressing molecular 
species: it is very frequent in the two yeast cell cycle net- 
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works but not in our ensembles of GRN. This absence 
occurs in spite of the fact that for other phenotypic con- 
straints, namely having fixed point expression patterns, 
our model does produce such motifs. Thus it would be of 
interest to see the impact on our results of forbidding in- 
teractions known to be unlikely on biochemical grounds, 
but such a study is beyond the scope of the present work. 
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